feat(eval): add retry-errors, fail_on_error tolerance, and error retry tracking by christso · Pull Request #442 · EntityProcess/agentv

christso · 2026-03-06T05:13:46Z

Summary

Implements three follow-up features from #431 (execution status classification):

--retry-errors <jsonl> (#433, "Add --retry-errors CLI flag to re-run only execution errors"): Re-runs only the execution_error test cases from a previous output. Non-error results are preserved and merged into the new output.
execution.fail_on_error (#434, "Add fail_on_error tolerance config for eval runs"): Configurable error tolerance. Accepts true (halt on first error), false (never halt, the default), or a 0.0–1.0 threshold ratio.
errorRetries (#435, "Track retried transient errors in eval results for diagnostics"): Tracks transient errors (timeouts) that were retried during provider invocation and attaches them to the final EvaluationResult.
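
As a rough illustration of the --retry-errors flow described above, the sketch below splits a previous JSONL output into the IDs to re-run and the rows to carry over, then merges fresh results back in. It is not the PR's implementation: the function names and the `executionStatus` field name are hypothetical, and only `testId`, `score`, and the `execution_error` status come from this PR's text.

```typescript
// Hypothetical result shape: only `testId` and `score` are named in this PR;
// the `executionStatus` field name is an assumption for illustration.
interface ResultRow {
  testId: string;
  score: number;
  executionStatus?: string;
}

// Parse JSONL text into values, skipping blank lines.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// IDs of test cases whose previous run ended in an execution error.
function errorTestIds(text: string): string[] {
  const ids: string[] = [];
  for (const row of parseJsonl(text)) {
    if (
      isRecord(row) &&
      row.executionStatus === "execution_error" &&
      typeof row.testId === "string"
    ) {
      ids.push(row.testId);
    }
  }
  return ids;
}

// Rows to carry over unchanged: non-error rows that pass a light shape check.
function nonErrorResults(text: string): ResultRow[] {
  const kept: ResultRow[] = [];
  for (const row of parseJsonl(text)) {
    if (!isRecord(row) || row.executionStatus === "execution_error") continue;
    if (typeof row.testId === "string" && typeof row.score === "number") {
      kept.push(row as unknown as ResultRow);
    }
  }
  return kept;
}

// Merge: preserved rows first, then the fresh results of the re-run cases.
function mergeResults(preserved: ResultRow[], rerun: ResultRow[]): ResultRow[] {
  return [...preserved, ...rerun];
}
```

Taking the file contents as a string rather than a path keeps the sketch free of filesystem calls; the real helpers may well read the file directly.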

Closes #433, closes #434, closes #435

Test plan

Unit tests for loadErrorTestIds / loadNonErrorResults (5 tests)
Unit tests for extractFailOnError config parser (9 tests)
Integration tests for fail_on_error orchestrator behavior (3 tests: true, threshold, false)
Integration tests for errorRetries tracking (2 tests: with/without retries)
Zod schema sync test passes (eval-schema.json matches generated schema)
Full test suite: 1,080 tests, 0 failures
Build, typecheck, lint all pass
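
For readers unfamiliar with the three accepted forms of execution.fail_on_error that the parser tests cover, here is a minimal sketch of such a parser. The name `parseFailOnError` and the fall-back-to-false behavior for invalid input are assumptions, not the actual extractFailOnError code; the exclusion of 0 follows the review-fix commit in this PR, and a later commit reduces the option to a plain true/false toggle.

```typescript
type FailOnError = boolean | number;

// Hypothetical parser: booleans pass through, numbers must satisfy 0 < n <= 1,
// and anything else falls back to the documented default of `false`.
function parseFailOnError(raw: unknown): FailOnError {
  if (typeof raw === "boolean") return raw;
  if (typeof raw === "number" && Number.isFinite(raw) && raw > 0 && raw <= 1) {
    return raw;
  }
  return false;
}
```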

🤖 Generated with Claude Code

…y tracking (#433, #434, #435)

Implements three follow-up features from #431 execution status classification:
- --retry-errors <jsonl>: re-run only execution_error test cases from a previous output
- execution.fail_on_error config: true (halt on first), false (never halt), or 0.0-1.0 threshold
- errorRetries field on EvaluationResult to track transient errors retried during provider invocation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Rewrite threshold test to exercise actual ratio math (succeed → succeed → fail → fail → fail triggers halt at 3/5=0.60 > 0.5)
- Fix docs range notation from 0.0-1.0 to >0.0-1.0 (exclusive of 0)
- Add concurrency best-effort note to docs
- Add comment explaining why 0 is excluded from numeric thresholds
- Add lightweight validation (testId + score) in loadNonErrorResults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
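
The ratio math in the first bullet can be traced with a small sketch. It assumes the ratio is computed over the tests completed so far and that the run halts once the ratio strictly exceeds the threshold; both are inferences from the 3/5 example, and `haltIndex` is a hypothetical helper, not the orchestrator's code.

```typescript
// Returns the 1-based position at which the run would halt, or null if it
// runs to completion. `outcomes[i] === true` means test i hit an execution error.
function haltIndex(outcomes: boolean[], failOnError: boolean | number): number | null {
  if (failOnError === false) return null; // never halt

  let errors = 0;
  for (let i = 0; i < outcomes.length; i++) {
    if (outcomes[i]) errors++;

    if (failOnError === true) {
      // Halt on the first error.
      if (errors > 0) return i + 1;
      continue;
    }

    // Assumed rule: running error ratio over completed tests, strict comparison.
    const ratio = errors / (i + 1);
    if (ratio > failOnError) return i + 1;
  }
  return null;
}

// succeed, succeed, fail, fail, fail
const run = [false, false, true, true, true];
console.log(haltIndex(run, 0.5)); // ratios 0, 0, 1/3, 2/4, 3/5: halts at test 5
console.log(haltIndex(run, true)); // halts at test 3, the first error
```

Under these assumptions 2/4 = 0.5 does not trigger a halt because the comparison is strict, which is why the sequence needs the fifth test to cross the threshold.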

cloudflare-workers-and-pages · 2026-03-06T05:14:38Z

Deploying agentv with Cloudflare Pages

Latest commit:	`4ea4508`
Status:	✅ Deploy successful!
Preview URL:	https://42e50489.agentv.pages.dev
Branch Preview URL:	https://feat-follow-up-431-error-tra.agentv.pages.dev

…old)

Align with industry standards (promptfoo, braintrust) by keeping fail_on_error as a simple true/false toggle. The numeric ratio threshold (0.0-1.0) was YAGNI; post-hoc analysis of JSONL output is sufficient for error ratio decisions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e to CLAUDE.md

Remove ErrorRetry interface, errorRetries field on EvaluationResult, and retry tracking code: there is no industry precedent, and a retry count can be added later if needed. Add YAGNI as design principle #4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

christso and others added 6 commits March 6, 2026 04:01

test: add unit tests for extractFailOnError config parser

b5e89e3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: add retry-errors, fail_on_error, and errorRetries documentation

d6e7e02

Updates eval-schema.json, SKILL.md, running-evals.mdx, and eval-files.mdx with documentation for the three new features from #433, #434, #435.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: add fail_on_error to Zod schema and regenerate eval-schema.json

5b1a1d2

The eval-schema-sync test requires the Zod schema to be the source of truth. Adds FailOnErrorSchema to ExecutionSchema and regenerates the JSON schema to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: format eval-schema.json with Biome

6a85c9d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

christso closed this Mar 6, 2026

christso deleted the feat/follow-up-431-error-tracking-tolerance-retry branch March 6, 2026 05:28

christso restored the feat/follow-up-431-error-tracking-tolerance-retry branch March 6, 2026 05:33

christso reopened this Mar 6, 2026

christso and others added 4 commits March 6, 2026 06:09

style: fix Biome formatting in config-loader

c0ded63

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: fix trailing blank lines

4ea4508

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

christso merged commit 6f64eb4 into main Mar 6, 2026
1 check passed

christso deleted the feat/follow-up-431-error-tracking-tolerance-retry branch March 6, 2026 06:28

christso merged 10 commits into main from feat/follow-up-431-error-tracking-tolerance-retry